Visual Contribution to Word Prominence Detection in a Playful Interaction Setting

Author

  • Martin Heckmann
Abstract

Realization of focus
  • Speakers produced words differently in the two focus conditions.
  • Acoustic features indicate high word prominence.
Classification experiments
  • Discrimination of the two focus classes reaches about 65% correct for individual features (acoustic or visual).
  • Exception, f0: 65-83% correct, depending on the speaker.
  • Exception, nose features: at chance level.
  • Combining energy with FFT or DCT features yields a significant audio-visual gain (from about 63% to about 69%).
  • Combining f0, or all acoustic cues, with FFT or DCT features yields a significant gain only for speaker B, whose f0 cue was weaker (e.g., from 79% to 86%).
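The classification experiments in the abstract compare individual acoustic or visual features with their combination. The following is a minimal sketch of that kind of comparison, not the paper's method. It rests on several assumptions. Each word is described by a fixed-length vector, for example an energy value plus the first DCT coefficients of a hypothetical per-frame mouth-opening trajectory. Classification uses a nearest-class-mean classifier evaluated with leave-one-out. Audio-visual combination is plain feature concatenation (early fusion). The function names (`dct_ii`, `early_fusion`, `nearest_centroid_loo`) and the toy numbers are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def dct_ii(signal: Sequence[float], n_coeffs: int) -> Vector:
    """Unnormalized DCT-II of a 1-D signal, keeping the first n_coeffs terms."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(signal))
        for k in range(n_coeffs)
    ]


def early_fusion(acoustic: List[Vector], visual: List[Vector]) -> List[Vector]:
    """Feature-level (early) fusion: concatenate each word's acoustic and visual vectors."""
    return [a + v for a, v in zip(acoustic, visual)]


def fit_scaler(rows: List[Vector]) -> Tuple[Vector, Vector]:
    """Per-column mean and standard deviation (a constant column gets std 1.0)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return means, stds


def scale(row: Vector, means: Vector, stds: Vector) -> Vector:
    """Z-score one feature vector with precomputed column statistics."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def nearest_centroid_loo(features: List[Vector], labels: List[str]) -> float:
    """Leave-one-out accuracy of a nearest-class-mean classifier.

    The scaler and class means are estimated on the training words only,
    so the held-out word never influences its own prediction.
    """
    correct = 0
    for i, held_out in enumerate(features):
        train = [f for j, f in enumerate(features) if j != i]
        train_y = [y for j, y in enumerate(labels) if j != i]
        means, stds = fit_scaler(train)
        train = [scale(f, means, stds) for f in train]
        test = scale(held_out, means, stds)
        centroids: Dict[str, Vector] = {}
        for cls in sorted(set(train_y)):  # sorted: deterministic tie-breaking
            members = [f for f, y in zip(train, train_y) if y == cls]
            centroids[cls] = [sum(c) / len(members) for c in zip(*members)]
        # Predict the class whose mean is closest in squared Euclidean distance.
        pred = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(test, centroids[c])),
        )
        correct += pred == labels[i]
    return correct / len(features)


if __name__ == "__main__":
    # Synthetic toy values, only to exercise the code (not data from the paper).
    labels = ["broad"] * 3 + ["narrow"] * 3
    energy = [[0.9], [1.1], [1.0], [1.4], [1.6], [1.5]]
    mouth_tracks = [
        [0.2, 0.3, 0.3, 0.2], [0.2, 0.4, 0.3, 0.2], [0.1, 0.3, 0.3, 0.1],
        [0.3, 0.6, 0.6, 0.3], [0.3, 0.7, 0.6, 0.2], [0.2, 0.6, 0.7, 0.3],
    ]
    visual = [dct_ii(track, 3) for track in mouth_tracks]
    print("energy only:", nearest_centroid_loo(energy, labels))
    print("energy + DCT:", nearest_centroid_loo(early_fusion(energy, visual), labels))
```

Concatenation is only one way to combine the modalities. A common alternative is decision-level fusion, which trains one classifier per modality and merges their scores. Swapping that in would change only `early_fusion` and the scoring step.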


Similar resources

Integrating sequence information in the audio-visual detection of word prominence in a human-machine interaction scenario

Modifying the articulatory parameters to raise the prominence of a segment of an utterance (hyperarticulation) is usually accompanied by a reduction of these parameters (hypoarticulation) for the neighboring segments. In this paper we investigate different approaches for the automatic labeling of the prominence of words. In particular, we investigate how the information in the sequence can be u...


Facial expression and prosodic prominence: Effects of modality and facial area

This article addresses two related questions regarding the perception of facial markers of prominence in spoken utterances: (1) how important are visual cues to prominence from the face with respect to auditory cues? and (2) are there differences between different facial areas in their cue value for prosodic prominence? The first perception experiment tackles the relation between auditory and v...


Steps Towards More Natural Human-Machine Interaction via Audio-Visual Word Prominence Detection

We investigate how word prominence can be detected from the acoustic signal and movements of the speaker’s head and mouth. Our research is based on a corpus of 12 English speakers that contains, in addition to the speech signal, videos of the talker’s head. To extract the word prominence information, we use, on the one hand, functionals calculated on the features and, on the other hand, Functional...


Audio-visual Evaluation and Detection of Word Prominence in a Human-Machine Interaction Scenario

This paper investigates the audio-visual correlates and the detection of word prominence. Subjects interacted with a computer in a small game that created a broad and a narrow focus condition. Audio-visual recordings were made with a distant microphone and without visual markers. Duration, intensity, fundamental frequency, and spectral emphasis were calculated as acoustic features. From t...


Feature-Level Decision Fusion for Audio-Visual Word Prominence Detection

Common fusion techniques in audio-visual speech processing operate on the modality level. That is, they either combine the features extracted from the two modalities directly or derive a decision for each modality separately and then combine the modalities on the decision level. We investigate the audio-visual processing of linguistic prosody, more precisely the extraction of word prominence. In th...




Publication date: 2014